Generalized Reinforcement Learning for Manipulation Skills – Combining Low-dimensional Bayesian Optimization with High-dimensional Motion Optimization
Abstract
This paper addresses the problem of how a robot can autonomously improve a manipulation skill in an efficient and secure manner. Instead of using the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) a known analytic cost function; 2) a black-box reward function; 3) a black-box binary success constraint. While optimization of the analytic cost function is inherently high-dimensional, in typical robot manipulation problems we may assume that the black-box reward and constraint depend only on a lower-dimensional projection of the policy. With our formulation we can exploit this structure and propose a sample-efficient learning framework that iteratively improves the skill with respect to the objective functions under the condition that the success constraint is fulfilled. The analytic cost function is optimized with motion optimization methods over the high-dimensional policy while the lower-dimensional parameters are held fixed. The black-box reward is optimized with constrained Bayesian optimization over the lower-dimensional parameters. During both improvement steps the success constraint is used to keep the optimization in a secure region and to clearly distinguish between motions that lead to success and those that lead to failure. The learning algorithm is evaluated on simulated benchmark problems and real-world tasks such as opening a door with a PR2.
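The alternating structure described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D waypoint policy, the choice of the final waypoint as the low-dimensional projection, the toy reward and success functions, and all function names are hypothetical. In particular, a simple random local search stands in for constrained Bayesian optimization to keep the example short.

```python
import numpy as np

def analytic_cost(w):
    """Known analytic cost (toy): squared segment lengths, start pinned at 0."""
    d = np.diff(np.concatenate(([0.0], w)))
    return float(np.sum(d ** 2))

def project(w):
    """Hypothetical low-dimensional projection: the final waypoint only."""
    return w[-1:].copy()

def black_box_reward(theta):
    """Toy stand-in for a reward that is only observable by executing the motion."""
    return float(-(theta[0] - 1.0) ** 2)

def black_box_success(theta):
    """Toy stand-in for the binary success signal."""
    return bool(0.5 <= theta[0] <= 1.5)

def optimize_motion(w, theta, steps=300, lr=0.2):
    """Gradient descent on the analytic cost with the low-dimensional part fixed."""
    w = w.copy()
    w[-1] = theta[0]
    for _ in range(steps):
        d = np.diff(np.concatenate(([0.0], w)))
        grad = 2.0 * d
        grad[:-1] -= 2.0 * d[1:]
        grad[-1] = 0.0  # the projected parameter is not moved in this step
        w -= lr * grad
    return w

def improve_low_dim(theta, rng, n_candidates=20, scale=0.2):
    """Random local search (NOT Bayesian optimization) that rejects failures."""
    best, best_reward = theta, black_box_reward(theta)
    for _ in range(n_candidates):
        cand = theta + scale * rng.standard_normal(theta.shape)
        if not black_box_success(cand):
            continue  # failed candidates are never accepted
        reward = black_box_reward(cand)
        if reward > best_reward:
            best, best_reward = cand, reward
    return best

def learn_skill(n_waypoints=8, iterations=10, seed=0):
    """Alternate the high-dimensional and low-dimensional improvement steps."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.6])  # assumed initial parameter that already succeeds
    w = np.zeros(n_waypoints)
    for _ in range(iterations):
        w = optimize_motion(w, theta)
        theta = improve_low_dim(project(w), rng)
    return optimize_motion(w, theta), theta

if __name__ == "__main__":
    w, theta = learn_skill()
    print("low-dimensional parameter:", theta)
    print("analytic cost:", analytic_cost(w))
```

In this sketch the success check is applied before a candidate is compared on reward, so the low-dimensional parameter never leaves the region where the toy task succeeds. A faithful implementation would replace the random search with a surrogate-model-based constrained Bayesian optimizer.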
Similar Resources
Combined Optimization and Reinforcement Learning for Manipulation Skills
This work addresses the problem of how a robot can improve a manipulation skill in a sample-efficient and secure manner. As an alternative to the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) A known analytic control cost function; 2) A black-box return functio...
Combining Trajectory Optimization, Supervised Machine Learning, and Model Structure for Mitigating the Curse of Dimensionality in the Control of Bipedal Robots
To overcome the obstructions imposed by high-dimensional bipedal models, we embed a stable walking motion in an attractive low-dimensional surface of the system’s state space. The process begins with trajectory optimization to design an open-loop periodic walking motion of the high-dimensional model and then adding to this solution, a carefully selected set of additional open-loop trajectories ...
Bayesian Optimization for Contextual Policy Search
Contextual policy search allows adapting robotic movement primitives to different situations. For instance, a locomotion primitive might be adapted to different terrain inclinations or desired walking speeds. Such an adaptation is often achievable by modifying a relatively small number of hyperparameters; however, learning when performed on an actual robotic system is typically restricted to a ...
Learning Dynamic Manipulation Skills under Unknown Dynamics with Guided Policy Search
Planning and trajectory optimization can readily be used for kinematic control of robotic manipulation. However, planning dynamic motor skills requires a detailed physical simulation, and some aspects of the task, such as contacts, are very difficult to simulate with enough accuracy for dynamic manipulation. Alternatively, manipulation skills can be learned from experience, allowing them to def...
Injection Optimization for Heavy Duty Diesel Engine in Order to Find High Efficiency and Low NOx Engine Concept by Means of Quasi Dimensional Multi-Zone Spray Modeling
The purpose of this study is to investigate the effect of injection parameters on a heavy duty diesel engine performance and emission characteristics. In order to analyze the injection and spray characteristics of diesel fuel with employing high-pressure common-rail injection system, the injection characteristics such as injection delay, injection duration, injection rate, number of nozzle hole...
Publication date: 2015